1.
Progress in Biomedical Optics and Imaging - Proceedings of SPIE ; 12465, 2023.
Article in English | Scopus | ID: covidwho-20245449

ABSTRACT

The coronavirus disease 2019 (COVID-19) pandemic had a major impact on global health and was associated with millions of deaths worldwide. During the pandemic, imaging characteristics of chest X-ray (CXR) and chest computed tomography (CT) played an important role in screening, diagnosis, and monitoring of disease progression. Various studies suggested that quantitative image analysis methods, including artificial intelligence and radiomics, can greatly boost the value of imaging in the management of COVID-19. However, few studies have explored the use of longitudinal multimodal medical images with varying visit intervals for outcome prediction in COVID-19 patients. This study aims to explore the potential of longitudinal multimodal radiomics in predicting the outcome of COVID-19 patients by integrating both CXR and CT images with variable visit intervals through deep learning. A total of 2274 patients who underwent CXR and/or CT scans during disease progression were selected for this study. Of these, 946 patients were treated at the University of Pennsylvania Health System (UPHS), and data for the remaining 1328 patients were acquired at Stony Brook University (SBU) and curated by the Medical Imaging and Data Resource Center (MIDRC). In total, 532 radiomic features were extracted with the Cancer Imaging Phenomics Toolkit (CaPTk) from the lung regions in CXR and CT images at all visits. We employed two commonly used deep learning algorithms to analyze the longitudinal multimodal features, and evaluated the prediction results based on the area under the receiver operating characteristic curve (AUC). Our models achieved testing AUC scores of 0.816 and 0.836, respectively, for the prediction of mortality. © 2023 SPIE.
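
The abstract reports model quality as AUC. As a reminder of what that number measures, the following minimal sketch computes AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The function name `roc_auc` and the toy labels and scores are assumptions made for illustration; they are not taken from the study.

```python
def roc_auc(labels, scores):
    """AUC as the chance that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = died, 0 = survived, with predicted risks.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.3, 0.4, 0.5, 0.1]
auc = roc_auc(labels, scores)  # 5 of the 6 positive/negative pairs are ordered correctly
```

The pairwise loop is quadratic in the number of cases, which is acceptable for a sketch; a rank-based formulation would be the usual choice for large test sets.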

2.
Sustainability ; 15(11):8924, 2023.
Article in English | ProQuest Central | ID: covidwho-20245432

ABSTRACT

Assessing e-learning readiness is crucial for educational institutions to identify areas in their e-learning systems needing improvement and to develop strategies to enhance students' readiness. This paper presents an effective approach for assessing e-learning readiness by combining the ADKAR model and machine learning-based feature importance identification methods. The motivation behind using machine learning approaches lies in their ability to capture nonlinearity in data and flexibility as data-driven models. This study surveyed faculty members and students in the Economics faculty at Tlemcen University, Algeria, to gather data based on the ADKAR model's five dimensions: awareness, desire, knowledge, ability, and reinforcement. Correlation analysis revealed a significant relationship between all dimensions. Specifically, the pairwise correlation coefficients between readiness and awareness, desire, knowledge, ability, and reinforcement are 0.5233, 0.5983, 0.6374, 0.6645, and 0.3693, respectively. Two machine learning algorithms, random forest (RF) and decision tree (DT), were used to identify the most important ADKAR factors influencing e-learning readiness. In the results, ability and knowledge were consistently identified as the most significant factors, with scores of ability (0.565, 0.514) and knowledge (0.170, 0.251) using RF and DT algorithms, respectively. Additionally, SHapley Additive exPlanations (SHAP) values were used to explore further the impact of each variable on the final prediction, highlighting ability as the most influential factor. These findings suggest that universities should focus on enhancing students' abilities and providing them with the necessary knowledge to increase their readiness for e-learning. This study provides valuable insights into the factors influencing university students' e-learning readiness.

3.
Tien Tzu Hsueh Pao/Acta Electronica Sinica ; 51(1):202-212, 2023.
Article in Chinese | Scopus | ID: covidwho-20245323

ABSTRACT

COVID-19 (coronavirus disease 2019) has had a serious impact worldwide, and many researchers have studied the prevention and control of the epidemic. Diagnosing COVID-19 from cough sounds is non-contact, low-cost, and easily accessible; however, such research is still relatively scarce in China. The Mel-frequency cepstral coefficient (MFCC) feature can only represent the static characteristics of sound, while the first-order differential MFCC feature also reflects its dynamic characteristics. To better prevent and treat COVID-19, the paper proposes a dynamic-static dual-input deep neural network algorithm for diagnosing COVID-19 from cough. Based on the Coswara dataset, cough audio is clipped, MFCC and first-order differential MFCC features are extracted, and a dual-input neural network model for dynamic and static features is trained. The model adopts a statistics pooling layer so that MFCC features of different lengths can be input. The experimental results show that the proposed algorithm significantly improves recognition accuracy, recall, specificity, and F1-score compared with existing models. © 2023 Chinese Institute of Electronics. All rights reserved.
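
Two ideas in this abstract can be illustrated briefly: a first-order difference over feature frames, and statistics pooling that maps a variable number of frames to a fixed-length vector. The sketch below is a simplified illustration under stated assumptions: it uses a plain frame-to-frame difference (audio toolkits often use a regression window instead), it pools raw features rather than hidden network activations, and the function names and toy frames are hypothetical.

```python
import math


def first_order_delta(frames):
    """Frame-to-frame difference of a feature sequence (one list per frame)."""
    return [[b - a for a, b in zip(prev, cur)]
            for prev, cur in zip(frames, frames[1:])]


def statistics_pooling(frames):
    """Concatenate the per-coefficient mean and population standard deviation.

    The output length depends only on the number of coefficients,
    not on the number of frames.
    """
    n = len(frames)
    dims = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    stds = [math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
            for d in range(dims)]
    return means + stds


# Hypothetical 2-coefficient "MFCC" frames for a short and a longer clip.
short_clip = [[1.0, 2.0], [2.0, 4.0], [4.0, 8.0]]
long_clip = short_clip + [[7.0, 8.0], [7.0, 8.0]]

static_vec = statistics_pooling(short_clip)                       # static branch input
dynamic_vec = statistics_pooling(first_order_delta(short_clip))   # dynamic branch input
```

Both clips pool to vectors of the same length, which is the property that lets clips of different durations feed the same network.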

4.
Journal of Business & Economic Statistics ; 41(3):846-861, 2023.
Article in English | ProQuest Central | ID: covidwho-20245136

ABSTRACT

This article studies multiple structural breaks in large contemporaneous covariance matrices of high-dimensional time series satisfying an approximate factor model. The breaks in the second-order moment structure of the common components are due to sudden changes in either factor loadings or covariance of latent factors, requiring appropriate transformation of the factor models to facilitate estimation of the (transformed) common factors and factor loadings via the classical principal component analysis. With the estimated factors and idiosyncratic errors, an easy-to-implement CUSUM-based detection technique is introduced to consistently estimate the location and number of breaks and correctly identify whether they originate in the common or idiosyncratic error components. The algorithms of Wild Binary Segmentation for Covariance (WBS-Cov) and Wild Sparsified Binary Segmentation for Covariance (WSBS-Cov) are used to estimate breaks in the common and idiosyncratic error components, respectively. Under some technical conditions, the asymptotic properties of the proposed methodology are derived with near-optimal rates (up to a logarithmic factor) achieved for the estimated breaks. Monte Carlo simulation studies are conducted to examine the finite-sample performance of the developed method and its comparison with other existing approaches. We finally apply our method to study the contemporaneous covariance structure of daily returns of S&P 500 constituents and identify a few breaks including those occurring during the 2007–2008 financial crisis and the recent coronavirus (COVID-19) outbreak. A software package is provided to implement the proposed algorithms.
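
To give a feel for the CUSUM idea mentioned above, here is a deliberately simplified sketch that locates a single break in the mean of a scalar series. It is not the article's method, which works on covariance structures of estimated factors and idiosyncratic errors and embeds the statistic in wild binary segmentation; the function name `cusum_break` and the toy series are assumptions for illustration.

```python
import math


def cusum_break(x):
    """Return (b, stat): the split point b maximizing the absolute CUSUM statistic.

    For each candidate b (1 <= b < n), the statistic contrasts the weighted
    sum of x[:b] with the weighted sum of x[b:].
    """
    n = len(x)
    total = sum(x)
    best_b, best_stat = None, -1.0
    left = 0.0
    for b in range(1, n):
        left += x[b - 1]
        right = total - left
        stat = abs(math.sqrt((n - b) / (n * b)) * left
                   - math.sqrt(b / (n * (n - b))) * right)
        if stat > best_stat:
            best_b, best_stat = b, stat
    return best_b, best_stat


# Hypothetical series whose mean jumps from 0 to 3 after the fifth observation.
series = [0.0] * 5 + [3.0] * 5
break_point, strength = cusum_break(series)
```

In practice the maximal statistic would be compared against a threshold before declaring a break, and the search would be repeated on sub-intervals to find multiple breaks.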

5.
Proceedings of SPIE - The International Society for Optical Engineering ; 12592, 2023.
Article in English | Scopus | ID: covidwho-20245093

ABSTRACT

Owing to the impact of COVID-19, the venues for dancers to perform have shifted from the stage to the media. In this study, we focus on the creation of dance videos that allow audiences to feel a sense of excitement without disturbing their awareness of the dance subject and propose a video generation method that links the dance and the scene by utilizing a sound detection method and an object detection algorithm. The generated video was evaluated using the Semantic Differential method, and it was confirmed that the proposed method could transform the original video into an uplifting video without any sense of discomfort. © 2023 SPIE.

6.
Geoscientific Model Development ; 16(11):3313-3334, 2023.
Article in English | ProQuest Central | ID: covidwho-20245068

ABSTRACT

Using climate-optimized flight trajectories is one essential measure to reduce aviation's climate impact. Detailed knowledge of temporal and spatial climate sensitivity for aviation emissions in the atmosphere is required to realize such a climate mitigation measure. The algorithmic Climate Change Functions (aCCFs) represent the basis for such purposes. This paper presents the first version of the Algorithmic Climate Change Function submodel (ACCF 1.0) within the European Centre HAMburg general circulation model (ECHAM) and Modular Earth Submodel System (MESSy) Atmospheric Chemistry (EMAC) model framework. In the ACCF 1.0, we implement a set of aCCFs (version 1.0) to estimate the average temperature response over 20 years (ATR20) resulting from aviation CO2 emissions and non-CO2 impacts, such as NOx emissions (via ozone production and methane destruction), water vapour emissions, and contrail cirrus. While the aCCF concept has been introduced in previous research, here, we publish a consistent set of aCCF formulas in terms of fuel scenario, metric, and efficacy for the first time. In particular, this paper elaborates on contrail aCCF development, which has not been published before. ACCF 1.0 uses the simulated atmospheric conditions at the emission location as input to calculate the ATR20 per unit of fuel burned, per NOx emitted, or per flown kilometre. In this research, we perform quality checks of the ACCF 1.0 outputs in two aspects. Firstly, we compare climatological values calculated by ACCF 1.0 to previous studies. The comparison confirms that in the Northern Hemisphere between 150–300 hPa altitude (flight corridor), the vertical and latitudinal structure of NOx-induced ozone and H2O effects are well represented by the ACCF model output. The NOx-induced methane effects increase towards lower altitudes and higher latitudes, which behaves differently from the existing literature.
For contrail cirrus, the climatological pattern of the ACCF model output corresponds with the literature, except that contrail-cirrus aCCF generates values at low altitudes near polar regions, which is caused by the conditions set up for contrail formation. Secondly, we evaluate the reduction of NOx-induced ozone effects through trajectory optimization, employing the tagging chemistry approach (contribution approach to tag species according to their emission categories and to inherit these tags to other species during the subsequent chemical reactions). The simulation results show that climate-optimized trajectories reduce the radiative forcing contribution from aviation NOx-induced ozone compared to cost-optimized trajectories. Finally, we couple the ACCF 1.0 to the air traffic simulation submodel AirTraf version 2.0 and demonstrate the variability of the flight trajectories when the efficacy of individual effects is considered. Based on the 1 d simulation results of a subset of European flights, the total ATR20 of the climate-optimized flights is significantly lower (roughly 50 % less) than that of the cost-optimized flights, with the most considerable contribution from contrail cirrus. The CO2 contribution observed in this study is low compared with the non-CO2 effects, which requires further diagnosis.

7.
Journal of Industrial and Management Optimization ; 19(6):4663, 2023.
Article in English | ProQuest Central | ID: covidwho-20244967

ABSTRACT

Disasters such as earthquakes, typhoons, floods and COVID-19 continue to threaten the lives of people in all countries. To cover the basic needs of the victims, emergency logistics should be implemented in time. The location-routing problem (LRP) tackles the facility location problem and the vehicle routing problem simultaneously to obtain an overall optimum. In response to the shortage of relief materials in the early post-disaster stage, a multi-objective model for the LRP considering fairness is constructed by evaluating the urgency coefficients of all demand points. The objectives are to minimize cost, delivery time and degree of dissatisfaction. Since the LRP is an NP-hard problem, a hybrid metaheuristic algorithm combining Discrete Particle Swarm Optimization (DPSO) and Harris Hawks Optimization (HHO) is designed to solve the model. In addition, three improvement strategies, namely elite-opposition learning, nonlinear escaping energy, and multi-probability random walk, are introduced to enhance its execution efficiency. Finally, the effectiveness and performance of the LRP model and the hybrid metaheuristic algorithm are verified by a case study of COVID-19 in Wuhan. The case study demonstrates that the hybrid metaheuristic algorithm is more competitive than other metaheuristic algorithms, with higher accuracy and a better ability to escape local optima.

8.
Applied Sciences ; 13(11):6515, 2023.
Article in English | ProQuest Central | ID: covidwho-20244877

ABSTRACT

With the advent of the fourth industrial revolution, data-driven decision making has also become an integral part of decision making. At the same time, deep learning is one of the core technologies of the fourth industrial revolution that have become vital in decision making. However, in the era of epidemics and big data, the volume of data has increased dramatically while the sources have become progressively more complex, making data distribution highly susceptible to change. These situations can easily lead to concept drift, which directly affects the effectiveness of prediction models. How to cope with such complex situations and make timely and accurate decisions from multiple perspectives is a challenging research issue. To address this challenge, we summarize concept drift adaptation methods under the deep learning framework, which is beneficial to help decision makers make better decisions and analyze the causes of concept drift. First, we provide an overall introduction to concept drift, including the definition, causes, types, and process of concept drift adaptation methods under the deep learning framework. Second, we summarize concept drift adaptation methods in terms of discriminative learning, generative learning, hybrid learning, and others. For each aspect, we elaborate on the update modes, detection modes, and adaptation drift types of concept drift adaptation methods. In addition, we briefly describe the characteristics and application fields of deep learning algorithms using concept drift adaptation methods. Finally, we summarize common datasets and evaluation metrics and present future directions.

9.
IEEE Aerospace Conference Proceedings ; 2023-March, 2023.
Article in English | Scopus | ID: covidwho-20244833

ABSTRACT

The Double Asteroid Redirection Test (DART) mission is NASA's first planetary defense mission to demonstrate the viability of kinetically impacting an asteroid and deflecting its trajectory. The DART spacecraft successfully launched on November 24, 2021 from the Vandenberg Space Force Base and successfully made impact on Dimorphos, the smaller asteroid in the Didymos system, on September 26, 2022. The DART spacecraft has one instrument called Didymos Reconnaissance and Asteroid Camera for Optical navigation (DRACO). DRACO is an imaging telescope that, in conjunction with the SMART Navigation algorithm, autonomously guided the DART spacecraft to the asteroid. Because DRACO is a mission critical and light sensitive instrument, the DRACO Door mechanism was designed as the protective cover. The door functions to shield DRACO from stray light during launch, to deploy in space once when commanded, and to stay 180 degrees open for the duration of the mission. The DRACO Door went through several iterations during the design phase with decisions on various components such as Frangibolts ®, torsion springs, hardstops, and latches. After fabrication and assembly, the door went through a rigorous environmental testing plan, which included deployment testing, vibration testing, and thermal vacuum testing. After successful qualification of the mechanism, the door was installed and integrated into the DART spacecraft. It should be noted that during the fabrication of the mechanism piece-parts, the COVID-19 pandemic began, and the effects of the pandemic were seen in the challenges faced during the DRACO door assembly and testing. Under the constraints of the pandemic, the DART spacecraft was successfully built, tested, and launched, and the DRACO door was successfully deployed on December 7, 2021. The door has continued to function as intended. 
This paper will discuss the design choices behind the door components, the environmental qualification test program, and the installation of the door onto the DART spacecraft. In addition, this paper will discuss the lessons learned and the challenges of fabricating and testing the flight hardware. © 2023 IEEE.

10.
Sustainability ; 15(10), 2023.
Article in English | Web of Science | ID: covidwho-20244491

ABSTRACT

Due to the inappropriate or untimely distribution of post-disaster goods, many regions did not receive timely and efficient relief for infected people in the coronavirus disease outbreak that began in 2019. This study develops a model for the emergency relief routing problem (ERRP) to distribute post-disaster relief more reasonably. Unlike general route optimizations, patients' suffering is taken into account in the model, allowing patients in more urgent situations to receive relief operations first. A new metaheuristic algorithm, the hybrid brain storm optimization (HBSO) algorithm, is proposed to deal with the model. The hybrid algorithm adds the ideas of the simulated annealing (SA) algorithm and large neighborhood search (LNS) algorithm into the BSO algorithm, improving its ability to escape from the local optimum trap and speeding up the convergence. In simulation experiments, the BSO algorithm, BSO+LNS algorithm (combining the BSO with the LNS), and HBSO algorithm (combining the BSO with the LNS and SA) are compared. The results of simulation experiments show the following: (1) The HBSO algorithm outperforms its rivals, obtaining a smaller total cost and providing a more stable ability to discover the best solution for the ERRP; (2) the ERRP model can greatly reduce the level of patient suffering and can prioritize patients in more urgent situations.

11.
2022 IEEE 14th International Conference on Humanoid, Nanotechnology, Information Technology, Communication and Control, Environment, and Management, HNICEM 2022 ; 2022.
Article in English | Scopus | ID: covidwho-20244294

ABSTRACT

The COVID-19 pandemic has given people much free time. With this, the researchers want to encourage these people to read instead of scrolling through social media. A barrier to reading for many people is not knowing what to read and disinterest in popular books that they would find when they search online. The existing websites that encourage book reading rely on social networking for their recommendations, while the collaborative filtering algorithms applied to books do not exist in the mobile application form. Readwell is a book recommender Android app with a Point-of-Sales System created using Java, Python, and SQLite databases. The information regarding the books was web scraped from the Goodreads website. It aims to apply the more efficient collaborative filtering algorithm to an accessible mobile application that allows users to directly buy the books they are interested in, thus encouraging the reading and buying of books. The researchers created unit test cases to validate the different functionalities of the application. © 2022 IEEE.
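
The abstract does not say which collaborative filtering variant the app uses. As one common possibility, the sketch below shows user-based collaborative filtering with cosine similarity over co-rated books. All names (`cosine`, `recommend`) and the toy ratings are hypothetical and are not taken from Readwell.

```python
import math


def cosine(u, v):
    """Cosine similarity between two users over the books both have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[b] * v[b] for b in common)
    norm_u = math.sqrt(sum(u[b] ** 2 for b in common))
    norm_v = math.sqrt(sum(v[b] ** 2 for b in common))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(ratings, user, top_n=3):
    """Rank unread books by the similarity-weighted average rating of other users."""
    scores, weights = {}, {}
    for other, their in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], their)
        if sim <= 0:
            continue
        for book, rating in their.items():
            if book not in ratings[user]:
                scores[book] = scores.get(book, 0.0) + sim * rating
                weights[book] = weights.get(book, 0.0) + sim
    predicted = {b: scores[b] / weights[b] for b in scores}
    return sorted(predicted, key=predicted.get, reverse=True)[:top_n]


# Hypothetical 1-5 star ratings.
ratings = {
    "ana": {"Dune": 5, "Emma": 1},
    "ben": {"Dune": 5, "Emma": 1, "Neuromancer": 5, "Persuasion": 1},
    "cat": {"Dune": 1, "Emma": 5, "Neuromancer": 1, "Persuasion": 5},
}
suggestions = recommend(ratings, "ana")
```

Here "ana" rates books like "ben" does, so the book "ben" liked is ranked first. Item-based or matrix-factorization variants would follow the same overall pattern of predicting missing ratings from observed ones.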

12.
2022 IEEE 14th International Conference on Humanoid, Nanotechnology, Information Technology, Communication and Control, Environment, and Management, HNICEM 2022 ; 2022.
Article in English | Scopus | ID: covidwho-20244265

ABSTRACT

The COVID-19 pandemic has disrupted the economy because rising infections affect the workforce in different sectors. The Philippine government imposed lockdowns to control the spread of infection, which urged different sectors to implement flexible work schedules or work-from-home setups. A work-from-home (WFH) setup burdens both the employee and the employer with installing equipment such as WiFi-equipped laptops, computers, tablets, or smartphones. However, internet stability in some areas of the Philippines is not yet reliable. In this study, an application is used to collect survey information and provide an estimate of the telework internet cost requirement of a given government employee implementing a work-from-home setup in their household. This involves survey results from respondents who are currently on a work-from-home setup; significant factors from the survey were analyzed using machine learning (ML) algorithms. Among the ML algorithms used, the ensemble bagged trees model outperformed the other models. This work can be extended by incorporating a wider range of datasets from different industries using work-from-home setups. In addition, it is recommended to study the WFH setup not only for government employees and employers but also in the education sector. © 2022 IEEE.

13.
Electronics ; 12(11):2378, 2023.
Article in English | ProQuest Central | ID: covidwho-20244207

ABSTRACT

This paper presents a control system for indoor safety measures using a Faster R-CNN (Region-based Convolutional Neural Network) architecture. The proposed system aims to ensure the safety of occupants in indoor environments by detecting and recognizing potential safety hazards in real time, such as capacity control, social distancing, or mask use. Using deep learning techniques, the system detects these situations to be controlled, notifying the person in charge of the company if any of these are violated. The proposed system was tested in a real teaching environment at Rey Juan Carlos University, using Raspberry Pi 4 as a hardware platform together with an Intel Neural Stick board and a pair of PiCamera RGB (Red Green Blue) cameras to capture images of the environment and a Faster R-CNN architecture to detect and classify objects within the images. To evaluate the performance of the system, a dataset of indoor images was collected and annotated for object detection and classification. The system was trained using this dataset, and its performance was evaluated based on precision, recall, and F1 score. The results show that the proposed system achieved a high level of accuracy in detecting and classifying potential safety hazards in indoor environments. The proposed system includes an efficiently implemented software infrastructure to be launched on a low-cost hardware platform, which is affordable for any company, regardless of size or revenue, and it has the potential to be integrated into existing safety systems in indoor environments such as hospitals, warehouses, and factories, to provide real-time monitoring and alerts for safety hazards. Future work will focus on enhancing the system's robustness and scalability to larger indoor environments with more complex safety hazards.

14.
CEUR Workshop Proceedings ; 3395:337-345, 2022.
Article in English | Scopus | ID: covidwho-20243829

ABSTRACT

The coronavirus outbreak has resulted in unprecedented measures, forcing authorities to make decisions related to establishing lockdowns in areas most affected by the pandemic. Social Media have supported people during this difficult time. On November 9, 2020, when the first vaccine with an efficacy rate over 90% was announced, social media reacted and people around the world began to express their feelings about this vaccination. This paper aims to analyze the dynamics of opinion on COVID-19 vaccination, in which the civil society is highly manifested in the vaccination process. We compared classical machine learning algorithms to select the best performing classifier. 4,392 tweets were collected and analyzed. The proposed approach can help governments create and evaluate appropriate communication tools to provide clear and relevant information to the general public, increasing public confidence in vaccination campaigns. © 2022 Copyright for this paper by its authors.

15.
Proceedings - 2022 2nd International Symposium on Artificial Intelligence and its Application on Media, ISAIAM 2022 ; : 43-47, 2022.
Article in English | Scopus | ID: covidwho-20243436

ABSTRACT

With the upgrading and innovation of the logistics industry, the requirements for smart transportation technologies continue to increase. The outbreak of COVID-19 has further promoted the development of unmanned transportation machines. To meet the requirements of intelligent following and automatic obstacle avoidance for mobile robots in dynamic and complex environments, this paper uses machine vision to realize visual perception and studies real-time path planning for robots in complicated environments. The paper proposes a Dijkstra-ant colony optimization (ACO) fusion algorithm: the environment model is established by the link viewable method, the Dijkstra algorithm plans the initial path, and an ant colony algorithm improved with immune operators optimizes the initial path. Finally, a simulation experiment shows that the fusion algorithm has good reliability in a dynamic environment. © 2022 IEEE.
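
The initial-path step can be illustrated with a standard Dijkstra search over a weighted graph. This sketch covers only that step; the environment modeling and the immune-operator ACO refinement described in the abstract are not shown. The function name `dijkstra_path` and the toy graph are assumptions for illustration.

```python
import heapq


def dijkstra_path(graph, start, goal):
    """Shortest path in a graph given as {node: {neighbor: non-negative weight}}.

    Returns (path, cost), or (None, inf) if the goal is unreachable.
    """
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue  # stale queue entry
        visited.add(node)
        if node == goal:
            break
        for nxt, w in graph.get(node, {}).items():
            nd = d + w
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(heap, (nd, nxt))
    if goal not in dist:
        return None, float("inf")
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    path.reverse()
    return path, dist[goal]


# Hypothetical waypoint graph: S = start, G = goal, weights = travel cost.
graph = {
    "S": {"A": 2.0, "B": 5.0},
    "A": {"B": 1.0, "G": 6.0},
    "B": {"G": 2.0},
    "G": {},
}
path, cost = dijkstra_path(graph, "S", "G")
```

A path found this way is optimal for the static graph; a refinement stage such as ACO would then adjust it as obstacles move.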

16.
Proceedings - 2022 13th International Congress on Advanced Applied Informatics Winter, IIAI-AAI-Winter 2022 ; : 181-188, 2022.
Article in English | Scopus | ID: covidwho-20243412

ABSTRACT

On social media, misinformation can spread quickly, posing serious problems. Understanding the content and sensitive nature of fake news and misinformation is critical to prevent the damage caused by them. To this end, the characteristics of information must first be discerned. In this paper, we propose a transformer-based hybrid ensemble model to detect misinformation on the Internet. First, false and true news on Covid-19 were analyzed, and various text classification tasks were performed to understand their content. The results were utilized in the proposed hybrid ensemble learning model. Our analysis revealed promising results, establishing the capability of the proposed system to detect misinformation on social media. The final model exhibited an excellent F1 score (0.98) and accuracy (0.97). The AUC (Area Under The Curve) score was also high at 0.98, and the ROC (Receiver Operating Characteristics) curve revealed that the true-positive rate of the data was close to one in this model. Thus, the proposed hybrid model was demonstrated to be successful in recognizing false information online. © 2022 IEEE.

17.
Proceedings - 2022 2nd International Symposium on Artificial Intelligence and its Application on Media, ISAIAM 2022 ; : 197-200, 2022.
Article in English | Scopus | ID: covidwho-20242924

ABSTRACT

With the development of intelligent algorithms, more and more social robots are being used to interfere with the transmission of information and the direction of international public opinion. Taking the COVID-19 agenda on Twitter as its entry point, this paper uses web crawling, Twitter bot detection, and data processing and analysis to examine the agenda setting of social robots on China-related issues, that is, to carry out a data visualization analysis of the stigmatized image of China. Through case analysis, concrete and actionable countermeasures for building an international communication system for China's image are provided. © 2022 IEEE.

18.
Applied Sciences ; 13(11):6437, 2023.
Article in English | ProQuest Central | ID: covidwho-20242320

ABSTRACT

Physical inactivity is becoming an important threat to public health in today's society. The COVID-19 pandemic has also reduced physical activity (PA) levels given all the restrictions imposed worldwide. In this work, physical activity interventions supported by mobile devices and relying on control engineering principles were proposed. The model was constructed relying on previous studies that consider a fluid analogy of Social Cognitive Theory (SCT), which is a psychological theory that describes how people acquire and maintain certain behaviors, including health-promoting behaviors, through the interplay of personal, environmental, and behavioral factors. The obtained model was validated using secondary data (collected earlier) from a real intervention with a group of male subjects in Great Britain. The present model was extended with new technology for a better understanding of behavior change interventions. This involved the use of applications, such as phone-based ecological momentary assessments, to collect behavioral data and the inclusion of simulations with logical reward conditions for reaching the behavioral threshold. A goal of 10,000 steps per day is recommended due to the significant link observed between higher daily step counts and lower mortality risk. The intervention was designed using a Model Predictive Control (MPC) algorithm configured to obtain a desired performance. The system was tested and validated using simulation scenarios that resemble different situations that may occur in a real setting.

19.
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) ; 13989 LNCS:703-717, 2023.
Article in English | Scopus | ID: covidwho-20242099

ABSTRACT

Machine learning models can use information from gene expressions in patients to efficiently predict the severity of symptoms for several diseases. Medical experts, however, still need to understand the reasoning behind the predictions before trusting them. In their day-to-day practice, physicians prefer using gene expression profiles, consisting of a discretized subset of all data from gene expressions: in these profiles, genes are typically reported as either over-expressed or under-expressed, using discretization thresholds computed on data from a healthy control group. A discretized profile allows medical experts to quickly categorize patients at a glance. Building on previous works related to the automatic discretization of patient profiles, we present a novel approach that frames the problem as a multi-objective optimization task: on the one hand, after discretization, the medical expert would prefer to have as few different profiles as possible, to be able to classify patients in an intuitive way; on the other hand, the loss of information has to be minimized. Loss of information can be estimated using the performance of a classifier trained on the discretized gene expression levels. We apply one common state-of-the-art evolutionary multi-objective algorithm, NSGA-II, to the discretization of a dataset of COVID-19 patients that developed either mild or severe symptoms. The results show not only that the solutions found by the approach dominate traditional discretization based on statistical analysis and are more generally valid than those obtained through single-objective optimization, but that the candidate Pareto-optimal solutions preserve the sense-making that practitioners find necessary to trust the results. © 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
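
The trade-off described above (few distinct profiles versus little information loss) rests on Pareto dominance. The sketch below filters a set of candidate solutions down to the non-dominated ones when both objectives are minimized. It shows only this building block; NSGA-II additionally ranks solutions into successive fronts, uses crowding distance, and evolves the population. The function names and the candidate values are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one.

    All objectives are assumed to be minimized.
    """
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(points):
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical candidates: (number of distinct profiles, classifier error).
candidates = [(4, 0.30), (6, 0.20), (6, 0.25), (10, 0.15), (12, 0.15)]
front = pareto_front(candidates)
```

Each point on the front represents a different balance between interpretability and accuracy, leaving the final choice to the practitioner.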

20.
Conference Proceedings - IEEE SOUTHEASTCON ; 2023-April:610-617, 2023.
Article in English | Scopus | ID: covidwho-20242090

ABSTRACT

We demonstrate the feasibility of a generalized technique for semantic deduplication in temporal data domains using graph-based representations of data records. Structured data records with multiple timestamp attributes per record may be represented as a directed graph where the nodes represent the events and the edges represent event sequences. Edge weights are based on elapsed time between connecting nodes. In comparing two records, we may merge these directed graphs and determine a representative directed acyclic graph (DAG) inclusive of a subset of nodes and edges that maintain the transitive weights of the original graphs. This DAG may then be evaluated by weighting elapsed time equivalences between records at each node and measuring the fraction of nodes represented in the DAG versus the union of nodes between the records being compared. With this information, we establish a duplication score and use a specified threshold requirement to assert duplication. This method is referred to as Temporal Deduplication using Directed Acyclic Graphs (TD:DAG). TD:DAG significantly outperformed established ASNM and ASNM+LCS methods for datasets representing two disparate domains, COVID-19 government policy data and PlayStation Network (PSN) trophy data. TD:DAG produced highly effective and comparable F1 scores of 0.960 and 0.972 for the two datasets, respectively, versus 0.864/0.938 for ASNM+LCS and 0.817/0.708 for ASNM. © 2023 IEEE.
